FitNets: Hints for Thin Deep Nets
Authors
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio
Abstract
While depth tends to improve network performance, it also makes gradient-based training more difficult, since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach is aimed at obtaining small and fast-to-execute models, and it has shown that a student network can imitate the soft output of a larger teacher network or ensemble of networks. In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student. Because the student intermediate hidden layer will generally be smaller than the teacher's intermediate hidden layer, additional parameters are introduced to map the student hidden layer to the prediction of the teacher hidden layer. This allows one to train deeper students that can generalize better or run faster, a trade-off that is controlled by the chosen student capacity. For example, on CIFAR-10, a deep student network with almost 10.4 times fewer parameters outperforms a larger, state-of-the-art teacher network.
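To make the hint idea concrete, below is a minimal PyTorch sketch of this stage: a small learned regressor (the "additional parameters" mentioned above) maps the student's thinner guided layer to the size of the teacher's hint layer, and an L2 loss aligns the two. The 1x1 convolutional regressor, the layer widths, and all variable names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Hint-based training: regress the student's intermediate (guided) layer
# onto the teacher's intermediate (hint) layer and minimize an L2 loss.
teacher_hint_channels = 128   # width of the teacher's hint layer (assumed)
student_guided_channels = 32  # width of the student's guided layer (assumed)

# The additional parameters mapping student features to the teacher's size.
regressor = nn.Conv2d(student_guided_channels, teacher_hint_channels, kernel_size=1)

def hint_loss(student_features, teacher_features):
    """0.5 * squared L2 distance between regressed student features and hints."""
    return 0.5 * (regressor(student_features) - teacher_features).pow(2).mean()

# Dummy feature maps: batch of 8 at 16x16 spatial resolution (assumed shapes).
student_feat = torch.randn(8, student_guided_channels, 16, 16, requires_grad=True)
teacher_feat = torch.randn(8, teacher_hint_channels, 16, 16)

loss = hint_loss(student_feat, teacher_feat.detach())  # teacher stays frozen
loss.backward()  # gradients flow to the student guided layer and the regressor
```

Minimizing this loss pre-trains the student up to its guided layer; the student is then trained with the usual distillation objective on the teacher's soft outputs.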
Similar Resources
All you need is a good init
Layer-sequential unit-variance (LSUV) initialization, a simple method for weight initialization for deep net learning, is proposed. The method consists of two steps. First, pre-initialize the weights of each convolution or inner-product layer with orthonormal matrices. Second, proceed from the first to the final layer, normalizing the variance of the output of each layer to be equal to one... (see the sketch after this list)
What is the Problem with Proof Nets for Classical Logic? (Lutz Straßburger)
This paper is an informal (and non-exhaustive) overview of some existing notions of proof nets for classical logic, and gives some hints as to why they might be considered unsatisfactory.
Generalization and Expressivity for Deep Nets
Along with the rapid development of deep learning in practice, theoretical explanations for its success have become urgent. Generalization and expressivity are two widely used measurements to quantify theoretical behaviors of deep learning. Expressivity focuses on finding functions that are expressible by deep nets but cannot be approximated by shallow nets with a similar number of neurons. It usually ...
Some Preliminary Hints on Formalizing UML with Object Petri Nets
Petri nets have already been used to formalize UML, and they have already shown, at least partially, what can be done in terms of analysis and simulation. Nevertheless, "conventional" Petri nets, like P/T nets and color nets, are not always enough to efficiently formalize the behavior associated with UML models when specifications heavily rely on typical object-oriented features, like inheritance...
Learning a Skill-Teaching Curriculum with Dynamic Bayes Nets
We propose an intelligent tutoring system that constructs a curriculum of hints and problems in order to teach a student skills with a rich dependency structure. We provide a template for building a multi-layered Dynamic Bayes Net to model this problem and describe how to learn the parameters of the model from data. Planning with the DBN then produces a teaching policy for the given domain. We ...
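Since the LSUV entry above describes a concrete two-step algorithm, here is the minimal NumPy sketch referenced from that entry: orthonormal pre-initialization followed by layer-by-layer rescaling until each layer's output variance is close to one on a data batch. The dense (inner-product) layers, ReLU nonlinearity, tolerance, and iteration cap are illustrative assumptions.

```python
import numpy as np

def orthonormal(fan_in, fan_out, rng):
    """Step 1: orthonormal pre-initialization via QR decomposition."""
    a = rng.standard_normal((max(fan_in, fan_out), min(fan_in, fan_out)))
    q, _ = np.linalg.qr(a)            # q has orthonormal columns
    q = q if fan_in >= fan_out else q.T
    return q[:fan_in, :fan_out]

def lsuv_init(layer_shapes, x, tol=0.05, max_iter=10, seed=0):
    """Step 2: from the first to the final layer, rescale each weight
    matrix until the variance of that layer's output is close to one."""
    rng = np.random.default_rng(seed)
    weights = [orthonormal(fi, fo, rng) for fi, fo in layer_shapes]
    h = x
    for w in weights:
        for _ in range(max_iter):
            v = (h @ w).var()         # variance of this layer's output
            if abs(v - 1.0) < tol:
                break
            w /= np.sqrt(v)           # scaling w by c scales the variance by c^2
        h = np.maximum(h @ w, 0.0)    # ReLU between layers (assumed)
    return weights

# Example: initialize a 3-layer MLP on a random batch of 256 inputs.
x = np.random.default_rng(1).standard_normal((256, 64))
ws = lsuv_init([(64, 128), (128, 128), (128, 10)], x)
```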
Journal: CoRR
Volume: abs/1412.6550
Publication date: 2014